Apple silicon has 128 byte alignment so fix our defines to match #52996

gbaraldi · 2024-01-21T22:27:26Z

Lines 33 to 50 in 8a69745

    
    mutable struct ReentrantLock <: AbstractLock
        # offset = 16
        @atomic locked_by::Union{Task, Nothing}
        # offset32 = 20, offset64 = 24
        reentrancy_cnt::UInt32
        # offset32 = 24, offset64 = 28
        @atomic havelock::UInt8 # 0x0 = none, 0x1 = lock, 0x2 = conflict
        # offset32 = 28, offset64 = 32
        cond_wait::ThreadSynchronizer # 2 words
        # offset32 = 36, offset64 = 48
        # sizeof32 = 20, sizeof64 = 32
        # now add padding to make this a full cache line to minimize false sharing between objects
        _::NTuple{Int === Int32 ? 2 : 3, Int}
        # offset32 = 44, offset64 = 72 == sizeof+offset
        # sizeof32 = 28, sizeof64 = 56

        ReentrantLock() = new(nothing, 0x0000_0000, 0x00, ThreadSynchronizer())
    end

This probably also needs a fix, and maybe other places do as well.
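For readers less familiar with the padding trick in that struct, here is a rough C analogue. It is only a sketch: `sketch_lock`, `SKETCH_CACHE_LINE`, and `same_cache_line` are hypothetical names, the field layout is not Julia's real object layout, and the 128-byte line size is assumed for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for this sketch (128 bytes, as reported for Apple silicon). */
#define SKETCH_CACHE_LINE 128

/* Hypothetical C stand-in for the lock's fields; not Julia's real layout. */
struct sketch_lock {
    /* Over-aligning the first member aligns the whole struct, and the
       compiler rounds sizeof up to a multiple of that alignment. */
    alignas(SKETCH_CACHE_LINE) void *locked_by;
    uint32_t reentrancy_cnt;
    uint8_t havelock;
    void *cond_wait[2];
};

static_assert(sizeof(struct sketch_lock) % SKETCH_CACHE_LINE == 0,
              "each lock should occupy whole cache lines");

/* Returns 1 if the two addresses fall in the same assumed cache line. */
static int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / SKETCH_CACHE_LINE == (uintptr_t)b / SKETCH_CACHE_LINE;
}
```

With this layout, the `havelock` fields of two adjacent array elements can never share a line, which is the false-sharing property the padding comment is after. Padding sized for 64-byte lines would not give that guarantee on a 128-byte-line machine.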

vtjnash

Should we also review the uses of malloc_cache_align? This particular field seems to be intended to be the largest sized value that can be loaded in a single instruction, or at least whatever alignment is most optimal for that (in bytes)?

gbaraldi · 2024-01-22T01:31:30Z

I guess we should review these things overall. I'm not sure it has anything to do with specific instructions, because both 64-byte alignment and loads are way too large to matter to instructions and seem to have to do with caches.

gbaraldi · 2024-01-22T13:59:09Z

I think the goal with malloc_cache_align is to make sure that different mallocs don't end up in the same cache line, so the large alignment asked here makes some sense.
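As a sketch of that idea (not the actual implementation of malloc_cache_align), a cache-line-aligned allocator could look like the following. It assumes a C11 libc that provides `aligned_alloc`; `SKETCH_CACHE_LINE`, `round_up_to_line`, `is_line_aligned`, and `sketch_malloc_cache_align` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed line size for this sketch. */
#define SKETCH_CACHE_LINE 128

static_assert((SKETCH_CACHE_LINE & (SKETCH_CACHE_LINE - 1)) == 0,
              "line size must be a power of two for the mask below");

/* Round a byte count up to the next multiple of the line size. */
static size_t round_up_to_line(size_t sz)
{
    return (sz + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
}

/* Returns 1 if p starts exactly on a line boundary. */
static int is_line_aligned(const void *p)
{
    return (uintptr_t)p % SKETCH_CACHE_LINE == 0;
}

/* Allocate memory that starts on a line boundary and spans whole lines,
   so no other allocation can share a line with it. Free with free(). */
static void *sketch_malloc_cache_align(size_t sz)
{
    if (sz == 0)
        sz = 1; /* avoid the implementation-defined zero-size case */
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    return aligned_alloc(SKETCH_CACHE_LINE, round_up_to_line(sz));
}
```

Rounding the size up matters as much as the alignment: without it, the tail of one block and the head of the next could still land in the same line.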

d-netto

Will probably be useful for #52994.

Can we prioritize merging it and backporting it to 1.10?

Thanks.

d-netto · 2024-01-24T12:00:21Z

Some versions of PPC apparently also have a cache line size of 128 bytes: https://reviews.llvm.org/D33656.

I don't know what the status of PPC support is, but it might be a good idea to adjust it as well.

vchuravy · 2024-01-24T14:58:56Z

We only support PPC8+, so yeah, we should adjust for that as well.
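A per-target define along the lines discussed here might look like the sketch below. It uses the compilers' predefined macros (`__APPLE__`, `__aarch64__`, `__powerpc64__`) rather than Julia's own platform macros, and `SKETCH_CACHE_BYTE_ALIGNMENT` and `sketch_cache_byte_alignment` are hypothetical names, so treat it as an illustration of the shape of the change rather than the merged diff.

```c
#include <assert.h>
#include <stddef.h>

/* Pick an assumed cache line size per target. The 128-byte cases follow
   the discussion above (Apple silicon, 64-bit PowerPC); 64 is the fallback. */
#if defined(__APPLE__) && defined(__aarch64__)
#  define SKETCH_CACHE_BYTE_ALIGNMENT 128
#elif defined(__powerpc64__)
#  define SKETCH_CACHE_BYTE_ALIGNMENT 128
#else
#  define SKETCH_CACHE_BYTE_ALIGNMENT 64
#endif

static_assert((SKETCH_CACHE_BYTE_ALIGNMENT & (SKETCH_CACHE_BYTE_ALIGNMENT - 1)) == 0,
              "alignment must be a power of two");

/* Expose the compile-time choice as a function for callers that want a value. */
static size_t sketch_cache_byte_alignment(void)
{
    return SKETCH_CACHE_BYTE_ALIGNMENT;
}
```

Because the value is fixed at compile time, code that pads structs by hand (like the lock above) has to be kept in sync with it separately.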

https://github.com/JuliaLang/julia/blob/8a69745bdcb06409ab7e4fc84718f34d7d54a7f9/base/lock.jl#L33-L50 this probably also needs a fix, and maybe other places as well (cherry picked from commit 91ec2bb)

Backported PRs:
- [x] #51095
- [x] #52583
- [x] #52645
- [x] #52423
- [x] #52721
- [x] #52637
- [x] #52752
- [x] #52758
- [x] #51375
- [x] #52994
- [x] #53015
- [x] #53032
- [x] #52748
- [x] #52856
- [x] #52878
- [x] #52754
- [x] #52228
- [x] #52924
- [x] #52569
- [x] #52605
- [x] #52618
- [x] #52781
- [x] #53055
- [x] #53096
- [x] #53076
- [x] #52841
- [x] #52078
- [x] #53035
- [x] #53066
- [x] #52996
- [x] #53121

Non-merged PRs with backport label:
- [ ] #52694
- [ ] #51479


Apple silicon has 128 byte alignment so fix our defines to match

fb62f73

vtjnash reviewed Jan 22, 2024


d-netto approved these changes Jan 22, 2024


d-netto added the backport 1.10 Change should be backported to the 1.10 release label Jan 22, 2024

KristofferC mentioned this pull request Jan 24, 2024

Backports for 1.10.1 #52755

Merged

33 tasks

KristofferC merged commit 91ec2bb into master Jan 24, 2024
8 checks passed

KristofferC deleted the gb/m1-alignment branch January 24, 2024 13:20

KristofferC removed the backport 1.10 Change should be backported to the 1.10 release label Feb 6, 2024
